Whereas the embedding distortion, the payload and the robustness of digital watermarking schemes
are well understood, the notion of security is still not completely well defined. The approach proposed 
in the last five years is too theoretical and solely considers the embedding process, which is half of 
the watermarking scheme. This paper proposes a new measurement of watermarking security, called the 
effective key length, which captures the difficulty for the adversary to get access to the watermarking 
channel. This new methodology is applied to additive spread spectrum schemes where theoretical and 
practical computations of the effective key length are proposed. It shows that these schemes are not 
secure as soon as the adversary gets observations in the Known Message Attack context. 
I. Introduction

From the early beginning of its history, watermarking has been characterized by a trade-off between the 
embedding distortion and the capacity. The embedding distortion counts how hiding messages degrades 
the host contents. The capacity is the theoretical amount of hidden data that can be reliably transmitted 
when facing an attack of a given strength. In practice, the operating point of a watermarking technique is 
defined by the embedding distortion, the payload, and the robustness. These are well defined and gauged, 
for instance, by a Document to Watermarking power Ratio DWR, a number of bits per host samples, and 
a Symbol Error Rate SER at a given Watermark to Noise power Ratio WNR. 

Security came as a fourth feature stemming from applications where there exist attackers willing to
circumvent watermarking such as copy and/or copyright protection. The efforts of the pioneering works 
introducing this new concept first focused on stressing the distinction between security and robustness. 
An early definition of security was coined by Ton Kalker as the inability by unauthorized users to have 
access to the raw watermarking channel [1]. 

The problem addressed in this paper is the following: the methodology to assess the security levels 
of watermarking schemes, proposed in [2], [3], [4], [5], [6], poorly captures T. Kalker's definition. In a 
nutshell, the methodology proposed in these papers is based on C. E. Shannon's definition of security 
for symmetric crypto-systems [7]. The security level is defined as the amount of uncertainty the attacker 
has about the secret key. This is measured by the equivocation which is the entropy of the key knowing 
some observations such as contents watermarked with the same technique and the same secret key. 

Section II-A presents this past approach in more detail and shows a surprising fact: this methodology
only takes into account the embedding side. How could it capture the 'access to raw watermarking channel' 
in Kalker's definition if just half of the scheme is considered? Obviously, the decoding process should also 
play a role. Translating the theoretical foundations of cryptography security of [7] in watermarking terms 
may not have been a good idea. Indeed, watermarking and symmetric cryptography differ sharply on
the following point: in symmetric cryptography, the deciphering key is unique and is the ciphering key.
Therefore, inferring this key from the observations (here, say, some cipher texts) is the main task of the
attacker. The disclosure of this key grants the adversary access to the crypto-channel. In watermarking,
several keys can indeed reliably decode hidden messages. Therefore, the precise disclosure of the secret
key used at the embedding side is a possible way to get access to the watermarking channel, but it may 
not be the only one. 

As a solution, this article proposes an alternative methodology to assess the security level of a 
watermarking scheme as detailed in Sect. II-B. In brief, our approach is based on the probability $P$ that
the adversary finds a key that grants him access to the watermarking channel as wished by Kalker:
either a key decoding hidden messages embedded with the true secret key, or a key embedding
messages that will be decoded with the true secret key. This gives birth to the concept of equivalent
keys presented in Sect. III. Our new definition of the security level is called the effective key length
and is quantified by $\ell = -\log_2(P)$ in bits. This transposes the notion of cryptographic key length to
watermarking: the bigger the effective key length, the smaller the probability of finding an equivalent
key. This alternative methodology equally takes into account the embedding and the decoding sides. It
is also simpler because it is not based on information theoretical notions and it allows the
effective key length to be evaluated experimentally (see Sect. V).

The contributions of the paper are the following: 

• A new methodology to estimate the security levels of watermarking schemes based on the definition
of equivalent keys, the probability of finding such an equivalent key, and its translation in bits
(Sect. III).

• The application of this methodology to the Spread Spectrum (SS) watermarking scheme, giving
closed-form expressions of the effective key length in Sect. IV.

• An experimental setup in Sect. V for estimating the effective key length, with a comparison to the
previous theoretical expressions.

• The comparison of SS and ISS (Improved Spread Spectrum) watermarking techniques given in
Sect. VI.

• The definitive evidence that these watermarking schemes have low security levels as soon as the
adversary can get observations.

II. Watermarking security 

This section details the methodology proposed so far to evaluate the security levels of watermarking 
schemes, and then it reviews our proposal. 
A. The past approach

We model the host by a vector x in set X extracted from a block of content. Given a secret key k, 
the embedding modifies this signal into vector y to hide message m: y = e(x, m,k). The secret key is 
usually a signal: In spread spectrum schemes [8], the secret key is the set of carriers; in Quantization 
Index Modulation schemes [9], [10], it is the dither randomizing the quantization. This signal is usually 
generated at the embedding and decoding sides thanks to a pseudo-random generator fed by a seed.
However, the attacker has no interest in disclosing this seed, because, by analyzing watermarked contents, 
it is usually simpler to directly estimate k without knowing this seed. 

The attacker may disclose different kinds of information about the secret key. First, he might get no 
information at all. This has been qualified as perfect covering in [2] or stego-security in [5]. This happens 
when there is a total lack of identifiability of the secret key. A partial lack of identifiability leads to
different classes of security where the attacker only learns that the secret key lies in a given subset. For
instance, in a spread spectrum scheme, he may learn that the watermark is added in a given subspace,
but he may identify the secret carriers only up to a rotation matrix in this subspace. This is defined
as subspace security in [5]. 

The application of the information theoretic approach of C. E. Shannon made it possible to quantify
watermarking security levels [2], [6], [3], [4]. This theory regards the signals used at the embedding as random
variables (r.v.). Let us denote by $\mathbf{K}$ the r.v. associated to the secret key, $\mathcal{K}$ the space of the secret keys, $\mathbf{X}$
the r.v. associated to the host, and $\mathcal{X}$ the space of the hosts. Before producing any watermarked content, the
designer draws the secret key $\mathbf{k}$ according to a given distribution $p_K$. The adversary knows $\mathcal{K}$ and $p_K$
but he does not know the instantiation $\mathbf{k}$. This lack of knowledge is measured in bits by the entropy of
the key $H(\mathbf{K}) = -\int_{\mathcal{K}} p_K(\mathbf{k}) \log_2 p_K(\mathbf{k})\, d\mathbf{k}$ (i.e., an integral if $\mathbf{K}$ is a continuous r.v. or a sum if $\mathbf{K}$ is a
discrete r.v.).

Now, suppose the adversary sees $N_o$ observations denoted as $O^{N_o} = \{O_1, \ldots, O_{N_o}\}$. The question is
whether this key will remain a secret once the attacker gets these observations. These include at least
some watermarked contents which have been produced by the same embedder (same algorithm $e(\cdot)$,
same secret key $\mathbf{k}$). These are also regarded as r.v. $\mathbf{Y}$. The observations may also encompass some other
data depending on the attack setup (see definitions of WOA, KMA, KOA in [2]). 

By carefully analyzing these observations, the attacker might deduce some information about the
secret key. The adversary can refine his knowledge about the key by constructing the a posteriori distribution
$p_K(\mathbf{k} \mid O^{N_o})$. The information leakage is given by the mutual information between the secret key and
the observations, $I(\mathbf{K}; O^{N_o})$, and the equivocation $h_e(N_o) = H(\mathbf{K} \mid O^{N_o})$ determines how this leakage
decreases the initial lack of information: $h_e(N_o) = H(\mathbf{K}) - I(\mathbf{K}; O^{N_o})$. The equivocation is always a
non-increasing function of $N_o$. Three things need to be known to compute these quantities: the distribution
of the keys $p_K$, the distribution of the host signals $p_X$ and the embedding equation $e(\cdot)$. With this
formulation, a perfect covering is tantamount to $I(\mathbf{K}; O^{N_o}) = 0$. Yet, for most of the watermarking
schemes, the information leakage is not null. If identifiability is granted, the equivocation about the
secret key decreases down to $0$ ($\mathbf{K}$ is a discrete r.v.) or $-\infty$ ($\mathbf{K}$ is a continuous r.v.) as the adversary
keeps on observing more data. This information theoretic framework to assess watermarking security
has been applied to popular watermarking schemes such as additive Spread Spectrum (SS) [6], [3], or
DC-QIM (Distortion Compensated Quantization Index Modulation) [4], [11].
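As a rough illustration of how the equivocation shrinks with the number of observations, the sketch below computes $H(K \mid O^{N_o})$ exactly for a deliberately simple toy model that is not taken from the paper: a uniform one-bit key observed $N_o$ times through a binary symmetric channel. The function names and the flip probability are assumptions made for this example only.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def equivocation(n_obs, flip=0.2):
    """H(K | O^n) in bits for a uniform 1-bit key seen n_obs times through a
    binary symmetric channel with crossover probability `flip` (toy model)."""
    h = 0.0
    for c in range(n_obs + 1):  # c = number of observations equal to 1
        comb = math.comb(n_obs, c)
        like0 = comb * flip ** c * (1 - flip) ** (n_obs - c)  # P(c | k = 0)
        like1 = comb * (1 - flip) ** c * flip ** (n_obs - c)  # P(c | k = 1)
        p_c = 0.5 * (like0 + like1)                           # P(c)
        post1 = 0.5 * like1 / p_c                             # P(k = 1 | c)
        h += p_c * binary_entropy(post1)
    return h
```

With no observation the function returns the key entropy $H(K) = 1$ bit, and each additional observation can only lower the value, in line with the non-increasing behaviour stated above.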

This framework is fruitful to establish if a watermarking scheme is perfectly secure and, if not, to
compare the information leakage of different systems. Nevertheless, it brings little information regarding
T. Kalker's basic definition of security, i.e. whether the adversary is able to access the watermarking
channel. Indeed, this methodology only needs $p_X$, $p_K$ and $e(\cdot)$ to derive the distribution of the observations
and, in the end, the equivocation. The decoding side is not taken into account. Yet, in practice, the
estimation of the secret key is only an intermediate goal for the adversary. The equivocation defined
above can be linked to the accuracy of this estimation. However, very few works studied the impact of
the estimation accuracy on the ability to gain unauthorized access to the watermarking channel.

B. Our proposal 

If we look at symmetric cryptography, the security is in direct relationship with the length of the secret
key. The key length $\ell$ in bits defines the number of possible secret keys as binary words of $\ell$ bits. The
key length provides the maximum number of tests, in logarithmic scale, of the brute force attack which
finds the key by scanning the $|\mathcal{K}|$ potential keys [12]. The stopping condition has little importance. One
often assumes that the adversary tests keys until decoded messages are meaningful. We can also rephrase
this with probability: if the adversary draws a key uniformly, the probability to pick the secret key is
$P = 2^{-\ell}$, or in logarithmic scale $-\log_2(P) = \ell$ bits. With the help of some observations, the goal of the
cryptanalyst is to find attacks requiring fewer operations than the brute force attack. A good cryptosystem
has a security close to its key length, and observing cipher texts is almost useless. For instance, the
best attack so far on one version of the Advanced Encryption Standard using a 128-bit secret key offers
a computational complexity of $2^{126.1}$ [13]. Studying security within a probabilistic framework has also
been done in other fields of cryptography (for instance, in authentication [14]).
Our idea is to transpose the notion of key length to watermarking. A crude try is to take the size of
the seed of the pseudo-random generator as it is the maximum number of tests of a brute force attack
scanning all the seeds. Yet, it doesn't take into account how the secret key is derived from the seed.
Another thought would be to take the dimension of the space $\mathcal{K}$, but again, it does not consider how
watermarking uses the secret key. We think that the best approach relies on a probabilistic framework
and on the fact that, in watermarking, the secret key may not be unique in some sense. Denote by $\hat{m}$
the message decoded from $\mathbf{y}$ with the secret key $\mathbf{k}$: $\hat{m} = d(\mathbf{y}, \mathbf{k})$. We expect that $\hat{m} = m$, but this might
also be the case for another decoding key $\mathbf{k}'$. This raises the concept of equivalent keys: for instance, $\mathbf{k}'$ is
equivalent to the secret key k if it grants the decoding of almost all contents watermarked with k. This 
idea was first mentioned in [15], where the authors made the first distinction between the key lengths 
in cryptography and watermarking. The fact that the decoding key might not be unique creates a big 
distinction with cryptography. However, the rationale of the brute force attack still holds. The attacker 
proposes a test key k' and we assume there is a genie telling him whether k' is equivalent to k. In other 
words, the security of a scheme does not rely on the difficulty of knowing whether k' is an equivalent 
key, but on the rarity of such keys: The lower the probability P of k' being equivalent to k, the more 
secure is the scheme. We propose to define the effective key length as a logarithmic measure of this 
probability. Note that in our proposal, we must pay attention to the decoding algorithm $d(\cdot)$ because it
is central to the definition of equivalent keys. 

Like in the previous methodology, the attack setup (WOA, KMA, KOA) determines the data from 
which the test key is derived. In this paper, we restrict our attention to the Known Message Attack 
(KMA: an observation is a pair of a watermarked content and the embedded message, $O_i = \{\mathbf{y}_i, m_i\}$).

Assessing the security of watermarking within a probabilistic framework is not new. S. Katzenbeisser 
has also listed the drawbacks of the information theoretic past approach [16]. He especially outlined the 
lack of assumption on the computing power of the attacker. He then proposed to gauge security as the 
advantage of the attacker. In a first step, the adversary, modeled by a probabilistic polynomial-time Turing
machine, observes contents watermarked with the secret key $\mathbf{k}_1$ or $\mathbf{k}_2$. Then, the designer produces a
new piece of content $\mathbf{y}$ and challenges the adversary to tell whether $\mathbf{y}$ has been watermarked with key $\mathbf{k}_1$ or
$\mathbf{k}_2$. The advantage is defined as the probability of a right guess minus 1/2. One clearly sees that a strictly
positive advantage implies that the adversary has been able to infer some information about the secret 
key during the first step. However, the relationship with its ability to access the watermarking channel is 
not straightforward: the decoding is not considered, and the notion of equivalent keys is missing. 
III. Definition of the effective key length

This section explains the concept of equivalent keys necessary to define the effective key length. We
define the decoding region $\mathcal{D}_m(\mathbf{k}) \subset \mathcal{X}$ associated to the message $m$ and the key $\mathbf{k}$ by:

$$\mathcal{D}_m(\mathbf{k}) = \{\mathbf{y} \in \mathcal{X} : d(\mathbf{y}, \mathbf{k}) = m\}. \tag{1}$$

The topology and location of this region in $\mathcal{X}$ depend on the decoding algorithm and on $\mathbf{k}$.

To hide message $m$, the encoder pushes the host vector $\mathbf{x}$ deep inside $\mathcal{D}_m(\mathbf{k})$, and this creates an
embedding region $\mathcal{E}_m(\mathbf{k}) \subset \mathcal{X}$:

$$\mathcal{E}_m(\mathbf{k}) = \{\mathbf{y} \in \mathcal{X} : \exists\, \mathbf{x} \in \mathcal{X} \text{ s.t. } \mathbf{y} = e(\mathbf{x}, m, \mathbf{k})\}. \tag{2}$$

A watermarking scheme provides robustness by embedding in such a way that the watermarked contents
are located far away from the boundary of the decoding region. If the vector extracted from an attacked
content $\mathbf{z} = \mathbf{y} + \mathbf{n}$ goes out of $\mathcal{E}_m(\mathbf{k})$, $\mathbf{z}$ might still be in $\mathcal{D}_m(\mathbf{k})$ and the correct message is decoded. For
some watermarking schemes (like QIM), we have $\mathcal{E}_m(\mathbf{k}) \subset \mathcal{D}_m(\mathbf{k})$. Therefore, there might exist another
key $\mathbf{k}'$ such that $\mathcal{E}_m(\mathbf{k}') \subseteq \mathcal{D}_m(\mathbf{k})$. A graphical illustration of this phenomenon is depicted on Fig. 1.
However, in general, even if there is no noise, $\mathcal{E}_m(\mathbf{k}) \not\subset \mathcal{D}_m(\mathbf{k})$, and we define the Symbol Error Rate
(SER) in the noiseless case as $\eta(0) = P[d(e(\mathbf{X}, M, \mathbf{k}), \mathbf{k}) \neq M]$. Capital letters $\mathbf{X}$ and $M$ make explicit the
fact that the probability is over two r.v.: the host and the message to be embedded.

Fig. 1. Graphical representation in space $\mathcal{X}$ of three decoding regions $\mathcal{D}_m(\mathbf{k})$, $\mathcal{D}_m(\mathbf{k}')$ and $\mathcal{D}_m(\mathbf{k}'')$ and the embedding
region $\mathcal{E}_m(\mathbf{k})$: the key $\mathbf{k}'$ belongs to the equivalent decoding region $\mathcal{K}_{eq}^{(d)}(\mathbf{k}, 0)$, which is not the case for $\mathbf{k}''$.
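We now define the equivalent keys and the associated equivalent region. We make the distinction
between the equivalent decoding keys (the equivalent decoding region) and the equivalent embedding
keys (resp. the equivalent embedding region).

The set of equivalent decoding keys $\mathcal{K}_{eq}^{(d)}(\mathbf{k}, \epsilon) \subset \mathcal{K}$ with $0 \le \epsilon$ is the set of keys that allow a decoding
of the hidden messages embedded with $\mathbf{k}$ with a probability of at least $1 - \epsilon$:

$$\mathcal{K}_{eq}^{(d)}(\mathbf{k}, \epsilon) = \{\mathbf{k}' \in \mathcal{K} : P[d(e(\mathbf{X}, M, \mathbf{k}), \mathbf{k}') \neq M] \le \epsilon\}. \tag{3}$$

In the same way, the set of equivalent encoding keys $\mathcal{K}_{eq}^{(e)}(\mathbf{k}, \epsilon) \subset \mathcal{K}$ is the set of keys that allow to
embed messages which are reliably decoded with key $\mathbf{k}$:

$$\mathcal{K}_{eq}^{(e)}(\mathbf{k}, \epsilon) = \{\mathbf{k}' \in \mathcal{K} : P[d(e(\mathbf{X}, M, \mathbf{k}'), \mathbf{k}) \neq M] \le \epsilon\}. \tag{4}$$

These sets are not empty for $\epsilon \ge \eta(0)$ since $\mathbf{k}$ is then an element. One expects that, for a sound design,
these sets are empty for $\epsilon < \eta(0)$. Note that for $\epsilon = 0$, these two definitions are equivalent to:

$$\mathcal{K}_{eq}^{(e)}(\mathbf{k}, 0) = \{\mathbf{k}' \in \mathcal{K} : \mathcal{E}_m(\mathbf{k}') \subset \mathcal{D}_m(\mathbf{k})\}, \tag{5}$$

and

$$\mathcal{K}_{eq}^{(d)}(\mathbf{k}, 0) = \{\mathbf{k}' \in \mathcal{K} : \mathcal{E}_m(\mathbf{k}) \subset \mathcal{D}_m(\mathbf{k}')\}. \tag{6}$$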




Fig. 2. Graphical representation of the key space $\mathcal{K}$ and the equivalent region $\mathcal{K}_{eq}(\mathbf{k})$. The dotted boundary represents the
support of the generative function $g(O^{N_o})$ which is used to draw new keys when the adversary gets observations.

The effective key length of a watermarking scheme is now defined using these definitions. The
adversary draws a key $\mathbf{k}' \in \mathcal{K}$ taking into account the set of observations $O^{N_o}$ with a generative function
$\mathbf{K}' = g(O^{N_o})$. The function $g(\cdot)$ is either deterministic or stochastic (such that $\mathbf{K}' \sim p_K(\mathbf{k} \mid O^{N_o})$ for
instance). A graphical example of the key space $\mathcal{K}$ and the equivalent region $\mathcal{K}_{eq}(\mathbf{k})$ is depicted on
Fig. 2 together with the support region of a potential generative function.

The probability $P^{(d)}(\epsilon, N_o)$ (or $P^{(e)}(\epsilon, N_o)$) that the adversary picks up a key belonging to the
equivalent decoding region (resp. equivalent embedding region) is:

$$P^{(d)}(\epsilon, N_o) = \mathbb{E}_{\mathbf{K}}\left[\mathbb{E}_{O^{N_o}}\left[\mathbb{E}_{\mathbf{K}'}\left[\mathbf{K}' \in \mathcal{K}_{eq}^{(d)}(\mathbf{K}, \epsilon) \mid O^{N_o}\right]\right]\right], \tag{7}$$

and similarly for $P^{(e)}(\epsilon, N_o)$. Finally, by analogy with cryptography, the effective key length translates
this probability into bits as follows:

$$\ell^{(d)}(\epsilon, N_o) = -\log_2\left(P^{(d)}(\epsilon, N_o)\right) \text{ bits}, \tag{8}$$

and similarly for $\ell^{(e)}(\epsilon, N_o)$. Note also that for some watermarking schemes, we have $\mathcal{K}_{eq}^{(d)}(\mathbf{k}, \epsilon) =
\mathcal{K}_{eq}^{(e)}(\mathbf{k}, \epsilon)$. There is then no need to make a distinction and we will denote the probability and the
effective key length as $P(\epsilon, N_o)$ and $\ell(\epsilon, N_o)$. Additionally, we call $\ell(\epsilon, 0)$ the basic key length, i.e. the
effective key length of a watermarking system when no observation is available.

We conclude this section by stating that the size of the seed is the maximum value of the effective key
length. We assume that the pseudo-random generator is public (Kerckhoffs's principle) so that nothing
prevents the attacker from using this generator. If any two different seeds produce two different secret
keys, then a brute force attack on the seed yields a key length of the size of the seed. Nevertheless,
the attacker may work with a different pseudo-random generator. The theoretical study below assumes
that he uses a perfectly random generator giving $\mathbf{K}' \sim p_K$ for $N_o = 0$, or that he uses $\mathbf{K}' = g(O^{N_o})$ for
$N_o > 0$. In practice, the value of the effective key length should be clipped to the size of the seed in bits.
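The translation of Eq. (8) and the clipping to the seed size can be sketched in a few lines. The function name and the optional `seed_bits` argument are hypothetical and only serve this illustration.

```python
import math


def effective_key_length(prob_equivalent, seed_bits=None):
    """Effective key length in bits, -log2(P), as in Eq. (8).

    prob_equivalent: probability that the adversary's key is equivalent.
    seed_bits: optional size of the generator seed; when given, the result
               is clipped to it, as suggested for practical use.
    """
    if not 0.0 < prob_equivalent <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    bits = -math.log2(prob_equivalent)
    if seed_bits is not None:
        bits = min(bits, float(seed_bits))
    return bits
```

For instance, a probability of $2^{-100}$ with a 64-bit seed yields 64 bits, because scanning all seeds is then the cheaper attack.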

IV. Theoretical effective key length computations 

The goal of this section is to compute the expressions of the key length for the most popular class of
watermarking schemes: additive spread spectrum.

A. The equivalent region 

Consider a spread spectrum one-bit watermarking s.t. $\mathbf{y} = e(\mathbf{x}, m, \mathbf{k}) = \mathbf{x} + (-1)^m a \mathbf{k}$, with $m \in \{0, 1\}$.
The host is modeled by a white Gaussian vector of size $N_v$ and power $\sigma_X^2$. The secret key is a pseudo-random
unit vector ($\|\mathbf{k}\| = 1$) and $\mathcal{K}$ is consequently the unit hypersphere. The parameter $a$ controls
the Document to Watermark power Ratio with the following relation:

$$a = \sqrt{N_v}\, \sigma_X\, 10^{-\frac{\mathrm{DWR}}{20}}. \tag{9}$$



February 17, 2012 



DRAFT 



IEEE TRANS. ON INFORMATION FORENSICS AND SECURITY, VOL. X, NO. Y, JANUARY 20 1Z 






The decoder is correlation based: $d(\mathbf{y}, \mathbf{k}) = 0$ if $\mathbf{y}^T \mathbf{k} > 0$, $1$ else. We assume that $\mathbf{y}$ is corrupted by an
independent white Gaussian noise of power $\sigma_N^2$. The SER is given by

$$\eta(\sigma_N) = \Phi\left(-\frac{a}{\sqrt{\sigma_X^2 + \sigma_N^2}}\right), \tag{10}$$

with $\Phi(\cdot)$ the cumulative distribution function of the standard normal random variable. Eq. (9) and (10)
show that the robustness of the scheme, quantified by $\eta(\sigma_N)$, increases with $N_v$: the SER decreases as $N_v$ grows.

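The adversary uses the same encoding or decoding functions but with a different key $\mathbf{k}'$ with $\|\mathbf{k}'\| = 1$.
We restrict our attention to the equivalent decoding keys. The reason is that $\mathcal{K}_{eq}^{(d)}(\mathbf{k}, \epsilon) = \mathcal{K}_{eq}^{(e)}(\mathbf{k}, \epsilon)$
because $d(e(\mathbf{x}, m, \mathbf{k}'), \mathbf{k})$ and $d(e(\mathbf{x}, m, \mathbf{k}), \mathbf{k}')$ have identical pdfs. We denote by $\theta$ the angle between
$\mathbf{k}$ and $\mathbf{k}'$: $\cos\theta = \mathbf{k}^T \mathbf{k}'$. The adversary's decoding statistic is $\mathbf{y}^T \mathbf{k}' \sim \mathcal{N}((-1)^m a \cos\theta, \sigma_X^2)$ and his SER
is

$$\epsilon = \Phi\left(-\frac{a \cos\theta}{\sigma_X}\right). \tag{11}$$

For a given $\epsilon \ge \eta(0)$, $\mathbf{k}'$ is an equivalent key if its angle with $\mathbf{k}$ is lower than

$$\theta_\epsilon = \arccos\left(-\Phi^{-1}(\epsilon)\, \sigma_X / a\right) \tag{12}$$

$$\phantom{\theta_\epsilon} = \arccos\left(-\frac{\Phi^{-1}(\epsilon)}{\sqrt{N_v}}\, 10^{\frac{\mathrm{DWR}}{20}}\right). \tag{13}$$

$\mathcal{K}_{eq}(\mathbf{k}, \epsilon)$ is the intersection of the unit hypersphere and the single inner hypercone of axis $\mathbf{k}$ and angle
$\theta_\epsilon$, i.e. a spherical cap.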
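The relations of Eq. (9) to (13) are easy to evaluate numerically. The following sketch uses only the Python standard library; the function names are hypothetical, and $\sigma_X = 1$ is assumed by default.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def ss_amplitude(n_v, dwr_db, sigma_x=1.0):
    """Embedding amplitude a of Eq. (9)."""
    return math.sqrt(n_v) * sigma_x * 10 ** (-dwr_db / 20)


def ser(a, sigma_x=1.0, sigma_n=0.0):
    """Symbol error rate of the legitimate decoder, Eq. (10)."""
    return _STD_NORMAL.cdf(-a / math.sqrt(sigma_x ** 2 + sigma_n ** 2))


def adversary_ser(a, theta, sigma_x=1.0):
    """Noiseless SER of a test key at angle theta from the secret key, Eq. (11)."""
    return _STD_NORMAL.cdf(-a * math.cos(theta) / sigma_x)


def theta_eps(eps, n_v, dwr_db):
    """Half-angle of the equivalent region (spherical cap), Eq. (13)."""
    cos_t = -_STD_NORMAL.inv_cdf(eps) * 10 ** (dwr_db / 20) / math.sqrt(n_v)
    if not 0.0 <= cos_t <= 1.0:
        raise ValueError("eps must lie between eta(0) and 0.5")
    return math.acos(cos_t)
```

Plugging the angle returned by `theta_eps` back into `adversary_ser` gives back $\epsilon$, which is a quick consistency check between Eq. (11) and Eq. (13).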

B. The basic key length 

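For $N_o = 0$, the probability that a key $\mathbf{k}'$ uniformly distributed over $\mathcal{K}$ is inside $\mathcal{K}_{eq}(\mathbf{k}, \epsilon)$ is the ratio
of the solid angle of this spherical cap and the full hypersphere (see Appendix A):

$$P_{SS}(\epsilon, 0) = \frac{1}{2}\left(1 - I_{\cos^2(\theta_\epsilon)}\left(\frac{1}{2}, \frac{N_v - 1}{2}\right)\right), \tag{14}$$

where $I(\cdot)$ is the regularized incomplete beta function. Fig. 4 shows that, contrary to the robustness quantified by $\eta(\sigma_N)$, the basic
key length is a decreasing function of $N_v$ for fixed $\epsilon$ and DWR. This illustrates the trade-off between
security and robustness. Appendix A gives the asymptotic value of the basic key length:

$$\lim_{N_v \to \infty} P_{SS}(\epsilon, 0) = \frac{1}{2}\left(1 - \mathrm{erf}\left(-\frac{\Phi^{-1}(\epsilon)}{\sqrt{2}}\, 10^{\frac{\mathrm{DWR}}{20}}\right)\right). \tag{15}$$

This means that SS schemes become more robust as $N_v \to \infty$ but their basic key length does not vanish
to 0.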
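The basic key length can be evaluated without a special-function library by computing the solid-angle ratio directly. The sketch below is an assumption-laden alternative to the incomplete beta expression: it integrates the surface element $\sin^{N_v-2}\theta$ over the cap with Simpson's rule and divides by the closed-form integral over $[0, \pi]$. Function names, the number of integration steps and $\sigma_X = 1$ are choices made for this example.

```python
import math
from statistics import NormalDist


def cap_fraction(n_v, theta, steps=2000):
    """Fraction of the unit hypersphere of R^n_v lying within angle theta of
    a fixed axis. `steps` must be even (Simpson's rule)."""
    p = n_v - 2                      # exponent of sin in the surface element
    h = theta / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.sin(i * h) ** p
    cap = total * h / 3
    # Integral of sin^p over [0, pi] equals sqrt(pi) * Gamma((p+1)/2) / Gamma(p/2 + 1).
    full = math.sqrt(math.pi) * math.exp(
        math.lgamma((p + 1) / 2) - math.lgamma(p / 2 + 1))
    return cap / full


def basic_key_length(eps, n_v, dwr_db):
    """Basic key length in bits for additive SS with no observation."""
    cos_t = -NormalDist().inv_cdf(eps) * 10 ** (dwr_db / 20) / math.sqrt(n_v)
    return -math.log2(cap_fraction(n_v, math.acos(cos_t)))
```

Calling `basic_key_length(0.01, 60, 10.0)` gives a value slightly above 100 bits, which is the order of magnitude quoted later in Sect. V-B for $N_v = 60$ and DWR = 10 dB.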
C. Key length for $N_o > 0$

For $N_o > 0$, we suppose without loss of generality that the embedded messages were all set to $0$ (if
not, we work with $(-1)^{m_i}\mathbf{y}_i$). One possible estimator $\hat{\mathbf{k}}$ is to compute the average of $\{\mathbf{y}_i\}_{i=1}^{N_o}$ and to
normalize it. The probability of this estimation being inside $\mathcal{K}_{eq}(\mathbf{k}, \epsilon)$ is approximated by the cumulative
distribution function of a non-central F-distribution variable of degrees of freedom $\nu_1 = 1$, $\nu_2 = N_v - 1$
and noncentrality parameter $\lambda = a^2 \frac{N_o}{\sigma_X^2}$, weighted by the probability $P[\mathbf{k}'^T \mathbf{k} > 0]$ (see Appendix A):

$$P_{SS}(\epsilon, N_o) \approx \left[1 - F\left(\frac{N_v - 1}{\tan^2(\theta_\epsilon)}; 1, N_v - 1, \lambda\right)\right] \cdot \Phi(\sqrt{\lambda}). \tag{16}$$

The experimental work below shows that this approximation is indeed very accurate in our setup.
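The behaviour of the averaging estimator can also be checked by simulation. The sketch below is a plain Monte Carlo experiment, not the non-central F approximation itself. It assumes $\sigma_X = 1$, takes the secret key as the first basis vector (harmless by isotropy), and samples the average of the $N_o$ hosts directly from its exact Gaussian distribution; all names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def estimate_p_kma(n_o, n_v=60, dwr_db=10.0, eps=0.01, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that the normalized average of
    n_o watermarked vectors (all carrying m = 0) is an equivalent key."""
    rng = random.Random(seed)
    a = math.sqrt(n_v) * 10 ** (-dwr_db / 20)   # Eq. (9) with sigma_x = 1
    cos_eps = -NormalDist().inv_cdf(eps) / a    # cos(theta_eps), Eq. (12)
    s = 1 / math.sqrt(n_o)   # std of each coordinate of the averaged host
    hits = 0
    for _ in range(trials):
        along = a + rng.gauss(0, s)             # component of the mean along k
        orth2 = sum(rng.gauss(0, s) ** 2 for _ in range(n_v - 1))
        cos_theta = along / math.sqrt(along ** 2 + orth2)
        hits += cos_theta >= cos_eps            # inside the spherical cap?
    return hits / trials
```

Under these settings the estimate is close to 0 for a single observation and close to 1 once many observations are averaged, so the effective key length $-\log_2 P$ collapses as $N_o$ grows.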

V. Practical effective key length computations 

Depending on the watermarking scheme, the effective key length defined by (8) may not have a closed-form
expression and this section aims at giving an experimental setup for its estimation. We first propose a general
framework with a high complexity. For the case of additive spread spectrum, some simplifications occur
and lead to a more practical experimental setup.

A. The general framework 

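If we are not limited in terms of computational power, the probability $P^{(d)}(\epsilon, N_o)$ can be approximated
using a classical Monte-Carlo method. We first generate a set of $N_1$ random secret keys $\{\mathbf{k}_i\}_{i=1}^{N_1}$. For
each of them, we also generate $N_2$ test keys $\{\mathbf{k}'_{i,j}\}_{j=1}^{N_2}$. Then, an estimation is:

$$\hat{P}^{(d)}(\epsilon, N_o) = \frac{1}{N_1 N_2} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} u^{(d)}(\mathbf{k}'_{i,j}, \epsilon), \tag{17}$$

where

$$u^{(d)}(\mathbf{k}'_{i,j}, \epsilon) = \begin{cases} 1 & \text{if } \mathbf{k}'_{i,j} \in \mathcal{K}_{eq}^{(d)}(\mathbf{k}_i, \epsilon), \\ 0 & \text{else.} \end{cases} \tag{18}$$

The probability $P^{(e)}(\epsilon, N_o)$ is approximated likewise using the indicator function $u^{(e)}(\cdot)$ of $\mathcal{K}_{eq}^{(e)}$.

For $N_o = 0$, each test key $\mathbf{k}'_{i,j}$ is independently drawn according to $p_K$. For $N_o > 0$, we first generate
a set of $N_o$ observations $O^{N_o}$ depending on $\mathbf{k}_i$, and we resort to a specific estimator to construct $\mathbf{k}'_{i,j} =
g(O^{N_o})$ (see Sec. III).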

Secondly, the equivalent region may not have a defined indicator function. In this case, we generate
$N_t$ other contents $\{\mathbf{y}_l\}_{l=1}^{N_t}$ watermarked with $\mathbf{k}_i$ (resp. original contents) and the test is satisfied if at
least $(1-\epsilon)N_t$ contents are correctly decoded (respectively embedded) using $\mathbf{k}'_{i,j}$. Mathematically, for
the decoding equivalence:

$$\mathbf{k}'_{i,j} \in \mathcal{K}_{eq}^{(d)}(\mathbf{k}_i, \epsilon) \Leftrightarrow \left|\{\mathbf{y}_l \in \mathcal{D}_{m_l}(\mathbf{k}'_{i,j})\}\right| \ge (1 - \epsilon) N_t. \tag{19}$$

In this case an estimation of $P^{(d)}(\epsilon, N_o)$ needs $N_1(N_2 N_o + N_t)$ embeddings and $N_1 N_2 N_t$ decodings. Due
to the limitation of the Monte-Carlo method, $N_1 N_2$ should be in the order of $1/P^{(d)}(\epsilon, N_o)$ for having
a meaningful relative variance of the estimation. The parameter $N_t$ should also be quite big for having
a good approximation of the indicator function of $\mathcal{K}_{eq}^{(d)}(\mathbf{k}_i, \epsilon)$. It is reasonable to take $N_t = O(c^{N_v})$ for
some constant $c$ where $N_v$ is the dimension of the space $\mathcal{X}$ containing $\mathcal{D}_m(\mathbf{k}'_{i,j})$.
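The generic procedure can be sketched as follows for $N_o = 0$, treating the embedder and decoder as black boxes. Additive SS with $\sigma_X = 1$ is plugged in purely as a stand-in scheme, and every function name and parameter value is an assumption of this example. It is only practical when the probability is not too small, hence the low dimension suggested below.

```python
import math
import random


def random_unit_vector(rng, n_v):
    """Key drawn uniformly over the unit hypersphere."""
    w = [rng.gauss(0, 1) for _ in range(n_v)]
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]


def embed(x, m, k, a):
    """Additive SS embedding, used here as a black box."""
    sign = -1.0 if m else 1.0
    return [xi + sign * a * ki for xi, ki in zip(x, k)]


def decode(y, k):
    """Correlation decoder, used here as a black box."""
    return 0 if sum(yi * ki for yi, ki in zip(y, k)) > 0 else 1


def estimate_p_decoding(n1, n2, n_t, n_v, a, eps, seed=0):
    """Monte Carlo estimate of P^(d)(eps, 0): n1 secret keys, n2 test keys per
    secret key, membership tested on n_t watermarked contents."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n1):
        k = random_unit_vector(rng, n_v)
        msgs = [rng.randrange(2) for _ in range(n_t)]
        marked = [embed([rng.gauss(0, 1) for _ in range(n_v)], m, k, a)
                  for m in msgs]
        for _ in range(n2):
            k_test = random_unit_vector(rng, n_v)   # N_o = 0: blind draw
            ok = sum(decode(y, k_test) == m for y, m in zip(marked, msgs))
            hits += ok >= (1 - eps) * n_t           # empirical membership test
    return hits / (n1 * n2)
```

With a toy setting such as `n_v=4`, `a=2.0` and `eps=0.1`, the equivalent region covers roughly a tenth of the sphere, so a few hundred test keys already give a usable estimate; at realistic dimensions the same loop would need an impractical number of draws.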

This procedure is generic and it blindly resorts to the embedding and the decoding algorithms as black
boxes. If we have some knowledge about the watermarking technique, some tricks reduce the complexity
of the estimation. First, the probability of finding an equivalent key might not depend on $\mathbf{k}_i$, so that
we can restrict to $N_1 = 1$ original key. This is the case for the spread spectrum technique. Second, for $N_o = 0$, the
probability to be estimated may be very small and out of reach of the Monte-Carlo method. We can use
a rare event probability estimator such as the one proposed in [17]. Last but not least, for a given $\mathbf{k}'_{i,j}$, the
geometry of $\mathcal{D}_m(\mathbf{k}'_{i,j})$ can help reducing $N_t$ while still obtaining a good approximation of the indicator
function of $\mathcal{K}_{eq}^{(d)}(\mathbf{k}_i, \epsilon)$. The following subsections put into practice these simplifications for the additive
spread spectrum technique.

B. Approximation of the equivalent region $\mathcal{K}_{eq}$

The equivalent region $\mathcal{K}_{eq}$ depends on the embedding and decoding. For the additive spread spectrum,
both processes are so simple that we were able to derive a closed-form formula of the probability in Sect. IV.
We suppose now that the embedding is more complex, which prevents theoretical derivations. We will
pretend in Sect. VI that the Improved Spread Spectrum proposed in [18] plays the role of such an
embedding.

For a given host $\mathbf{x}$, we can always express the result of the embedding as

$$\mathbf{y} = e(\mathbf{x}, m, \mathbf{k}) = a(\mathbf{x}, m)\mathbf{k} + b(\mathbf{x}, m)\mathbf{u}_\perp(\mathbf{x}, m), \tag{20}$$

where $\mathbf{k}^T \mathbf{u}_\perp(\mathbf{x}, m) = 0$. The decoding with $\mathbf{k}'$ is based on the quantity:

$$\mathbf{y}^T \mathbf{k}' = a(\mathbf{x}, m)\cos(\theta) + b(\mathbf{x}, m)\,(\mathbf{k}'^T \mathbf{u}_\perp(\mathbf{x}, m)), \tag{21}$$

whose sign yields the decoded bit $\hat{m}$. It is important to note that the decoding step using a test key $\mathbf{k}'$ can
be performed in a 2-dimensional space spanned by $(\mathbf{k}, \mathbf{u}_\perp(\mathbf{x}, m))$. The Symbol Error Rate is expressed
in terms of the CDF of the r.v. $\mathbf{Y}^T \mathbf{k}'$, which depends on $\theta$, and is thus denoted $\mathrm{SER}(\theta)$. For
$\theta = 0$, we have $\mathrm{SER}(0) = \eta(0)$. For $\epsilon > \eta(0)$, we define

$$\theta_\epsilon = \max_{\mathrm{SER}(\theta) = \epsilon} \theta. \tag{22}$$

This shows that the equivalent decoding region is a hypercone of axis $\mathbf{k}$ and angle $\theta_\epsilon$ which depends on
the embedding. The only thing we need is to experimentally estimate the angle $\theta_\epsilon$. Then, we use Eq. (14)
in order to obtain an approximation of the effective key length.

The estimation of $\theta_\epsilon$ is made under the following rationale. A vector $\mathbf{y}$ watermarked by $\mathbf{k}$ with $m = 1$
is correctly decoded by any $\mathbf{k}'$ s.t. $\mathbf{k}'^T \mathbf{k} > \cos(\theta_\epsilon)$ if its angle $\phi$ with $\mathbf{k}$ is such that $\phi \notin [\theta_\epsilon - \pi/2, \theta_\epsilon + \pi/2]$
(see Fig. 3). In practice, we generate $N_t$ contents $\{\mathbf{y}_i\}_{i=1}^{N_t}$ watermarked with $m = 1$, and we compute
their angles $\{\phi_i\}_{i=1}^{N_t}$ with $\mathbf{k}$. Once sorted in increasing order, we iteratively find the angle $\phi_{min}$ such that
$\mathrm{int}((1 - \epsilon)N_t)$ vectors have their angle $\phi \notin [\phi_{min} - \pi/2, \phi_{min} + \pi/2]$ and set $\theta_\epsilon = \pi/2 - \phi_{min}$.

A much lower number of vectors is needed to accurately estimate one parameter than a full region
of the space. $N_t$ and $N_v$ directly impact the accuracy of $\theta_\epsilon$, but since this boils down to the estimation
of a single parameter, the magnitude of $N_t$ is rather low in comparison with the effective key length.
For example, at $N_v = 60$ and DWR = 10 dB, we generate $N_t = 10^6$ contents in order to obtain a reliable
effective key length of more than 100 bits, whereas an estimation based on (19) would have required
$N_t \approx 2^{\ell} \times 10^3 \approx 10^{33}$ contents. Moreover, the angle $\theta_\epsilon$ is the same for any $\mathbf{k}$, so the estimation is done
only once. This avoids the counting of correct decodings over $N_t$ vectors of (19).
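A single-parameter estimation of this kind can be sketched as follows. Note that this is not the sorted-angle procedure described above but a simpler stand-in: it measures the empirical $\mathrm{SER}(\theta)$ for a test key rotated by $\theta$ inside a fixed 2-dimensional plane and bisects on $\theta$. It assumes an isotropic scheme (true for additive SS, used here with $\sigma_X = 1$ and all messages set to $m = 0$), takes $\mathbf{k}$ and the auxiliary direction as the first two basis vectors, and uses hypothetical names throughout.

```python
import math
import random
from statistics import NormalDist


def estimate_theta_eps(eps, n_v=60, dwr_db=10.0, n_t=20000, seed=0):
    """Empirical estimate of theta_eps from n_t watermarked contents."""
    rng = random.Random(seed)
    a = math.sqrt(n_v) * 10 ** (-dwr_db / 20)      # Eq. (9) with sigma_x = 1
    proj = []
    for _ in range(n_t):
        y = [rng.gauss(0, 1) for _ in range(n_v)]  # host x
        y[0] += a                                  # SS embedding of m = 0, k = e_1
        proj.append((y[0], y[1]))                  # projections on k and on u = e_2

    def ser(theta):
        # Test key k' = cos(theta) k + sin(theta) u; an error occurs when the
        # correlation y^T k' is not strictly positive.
        c, s = math.cos(theta), math.sin(theta)
        return sum(c * yk + s * yu <= 0 for yk, yu in proj) / n_t

    lo, hi = 0.0, math.pi / 2
    for _ in range(30):                            # bisection on the angle
        mid = (lo + hi) / 2
        if ser(mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo
```

For additive SS the result can be compared with the closed form of Eq. (13); for a more complex embedder, only the two lines that build `y` would change.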

C. Rare event probability estimator 

A fast rare event probability estimator$^1$ is described in [19]. We explain its application for the case
$N_o = 0$. This algorithm estimates the probability $P[s(\mathbf{K}') > 0]$ under $\mathbf{K}' \sim p_K$. It needs three ingredients:
the generation of test keys distributed according to $p_K$, the distribution invariant modification of test
keys, and the soft score function $s(\cdot)$.

We work with an auxiliary random vector $\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{N_v})$. The generator draws $\mathbf{W}$ and outputs
a test key $\mathbf{K}' = \mathbf{W}/\|\mathbf{W}\|$. Since the distribution of $\mathbf{W}$ is isotropic, $\mathbf{K}'$ is uniformly distributed over
the hypersphere. The algorithm draws $n$ such test keys, and iteratively modifies those having a low
score. The modification takes back $\mathbf{W}$, adds an independent noise $\mathbf{N} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{N_v})$, and scales the result:
$\mathbf{W}' = (\mathbf{W} + \mu \mathbf{N})/\sqrt{1 + \mu^2}$. Parameter $\mu$ controls the strength of the modification. In the end, it returns
a new test key $\mathbf{W}'/\|\mathbf{W}'\|$. For any value of $\mu$, the modification leaves the distribution invariant because
$\mathbf{W}' \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{N_v})$. The properties of this algorithm depend on $n$ as given in [19]. Qualitatively, the bigger
$n$ is, the more accurate but slower is this estimator.

$^1$ Available as a Matlab toolbox at www.irisa.fr/texmex/people/furon/src.html

Fig. 3. Projections of $N_t = 5000$ watermarked vectors ($m = 1$) on $\mathbf{k}$ and $\mathbf{u}_\perp$, DWR = 10 dB, $N_v = 60$, $\epsilon = 10^{-2}$. The
vector $\mathbf{k}'_{max}$ correctly decodes $\mathrm{int}((1 - \epsilon)N_t)$ contents.

We propose two score functions depending on whether we know the equivalent region $\mathcal{K}_{eq}$:

1) $\mathcal{K}_{eq}$ is known (Sect. IV) or approximated (Sect. V-B): the score function is simply a metric between
the test key and the border of the equivalent region: $s(\mathbf{K}') = \mathbf{K}'^T \mathbf{k} - \cos(\theta_\epsilon)$. In the end, the algorithm
returns an estimation of $P[\cos(\theta) > \cos(\theta_\epsilon)]$ when $\mathbf{K}'$ is uniformly distributed over the hypersphere.

2) $\mathcal{K}_{eq}$ is not known: we generate $N_t$ contents $\{\mathbf{y}_i\}_{i=1}^{N_t}$ watermarked with $\mathbf{k}$, and the score function
is the $\mathrm{int}(\epsilon N_t)$-th smallest 'distance' from these vectors to the set $\mathcal{D}_m(\mathbf{k}')$, where $\mathrm{int}(\cdot)$ denotes the
closest integer function. For SS or ISS, this 'distance' is for instance the correlation $\mathbf{k}'^T \mathbf{y}$. In the end, the
algorithm returns an estimation of the probability that $\mathrm{int}((1 - \epsilon)N_t)$ vectors are correctly decoded, when $\mathbf{K}'$ is uniformly
distributed over the hypersphere.
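The three ingredients named in this subsection (key generator, distribution-invariant modification, and the score of case 1) can be sketched as below. This is not the estimator of [19] itself, whose selection and iteration logic is not reproduced here; the function names are hypothetical.

```python
import math
import random


def draw_w(rng, n_v):
    """Auxiliary isotropic Gaussian vector W ~ N(0, I)."""
    return [rng.gauss(0, 1) for _ in range(n_v)]


def to_key(w):
    """Test key K' = W / ||W||, uniform over the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]


def modify(rng, w, mu):
    """Distribution-invariant move W' = (W + mu N) / sqrt(1 + mu^2)."""
    scale = math.sqrt(1 + mu * mu)
    return [(wi + mu * rng.gauss(0, 1)) / scale for wi in w]


def score(k_test, k, cos_theta_eps):
    """Soft score of case 1: positive iff k_test lies inside the cap."""
    return sum(a * b for a, b in zip(k_test, k)) - cos_theta_eps
```

Because each coordinate of `modify(rng, w, mu)` is again a unit-variance Gaussian, repeated modifications keep the test keys uniformly distributed over the hypersphere while letting them drift towards higher scores once low-scoring candidates are discarded.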

VI. Results and Discussions 

The goal of the experimental part is threefold. First, we wish to assess the soundness of the experimental
measurement of the effective key length. This is done by a comparison to the theoretical results for
the additive Spread Spectrum. Second, we would like to illustrate the trade-off between security and
robustness. Third, we compare the additive Spread Spectrum (SS) to the Improved Spread Spectrum
(ISS) [18].

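In the latter method, the embedding has two parameters $(\beta, \gamma)$: $e(\mathbf{x}, m, \mathbf{k}) = \mathbf{x} + (-1)^m(\beta - \gamma(\mathbf{x}^T\mathbf{k}))\mathbf{k}$. 
For a fair comparison, the parameters $N_v$, $\epsilon$, $\sigma_x$ and DWR are fixed. This implies that 

$$a^2 = \beta^2 + \gamma^2\sigma_x^2 = N_v\sigma_x^2 10^{-\mathrm{DWR}/10}. \tag{23}$$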
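The mean embedding distortion $\beta^2 + \gamma^2\sigma_x^2$ appearing in (23) can be checked numerically. The sketch below is only an illustration: it assumes a unit-norm key and white Gaussian hosts of variance $\sigma_x^2$, and the function names are hypothetical.

```python
import math
import random

def iss_embed(x, m, k, beta, gamma):
    """e(x, m, k) = x + (-1)^m (beta - gamma * x^T k) k, with ||k|| = 1."""
    corr = sum(xi * ki for xi, ki in zip(x, k))
    amp = (-1) ** m * (beta - gamma * corr)
    return [xi + amp * ki for xi, ki in zip(x, k)]

def mean_embedding_distortion(n_v, beta, gamma, sigma_x, trials, seed=0):
    """Monte Carlo estimate of E||e(x, m, k) - x||^2 for white Gaussian hosts."""
    rng = random.Random(seed)
    k = [1.0 / math.sqrt(n_v)] * n_v  # any unit-norm key works here
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma_x) for _ in range(n_v)]
        y = iss_embed(x, rng.randint(0, 1), k, beta, gamma)
        total += sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    return total / trials
```

Setting `gamma = 0` gives additive SS, for which the distortion is exactly `beta ** 2`.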

The robustness is gauged by using an AWGN channel of variance $\sigma_N^2$, giving a Watermark to Noise Ratio 
WNR (the ratio of the watermark power to $\sigma_N^2$, expressed in dB). As for the security, we use $N_t = 10^6$ contents to estimate $\theta_\epsilon$ for $N_o = 0$ as 
explained in Sect. V-B. The two embedding functions, SS and ISS, produce different angles. Then, the 
rare event probability estimator is used as described in Sect. V-C with $n = 80$. For $N_o > 0$, the attacker's 
key estimator $g(\cdot)$ is just the normalized average of the vectors $\{(-1)^{m_i}\mathbf{y}_i\}_{i=1}^{N_o}$ as explained in App. A. It 
appears that the probabilities to be estimated are then much larger, and the Monte Carlo method of 
Sect. V-A is good enough. 
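The key estimator used for $N_o > 0$ can be sketched as below. This is a toy simulation, not the authors' experimental code: it assumes additive SS with embedding $\mathbf{y} = \mathbf{x} + (-1)^m a\mathbf{k}$, white Gaussian hosts, and an amplitude $a$ set from the DWR as in (34); all function names are hypothetical.

```python
import math
import random

def ss_embed(x, m, k, a):
    """Additive spread spectrum: y = x + (-1)^m a k."""
    s = (-1) ** m * a
    return [xi + s * ki for xi, ki in zip(x, k)]

def estimate_key(contents, messages):
    """Normalized average of the vectors (-1)^{m_i} y_i.

    Normalizing the sum gives the same direction as normalizing the average.
    """
    n_v = len(contents[0])
    acc = [0.0] * n_v
    for y, m in zip(contents, messages):
        sign = (-1) ** m
        for j in range(n_v):
            acc[j] += sign * y[j]
    norm = math.sqrt(sum(v * v for v in acc))
    return [v / norm for v in acc]

def cosine_with_true_key(n_v, n_o, dwr_db, sigma_x=1.0, seed=0):
    """Simulate n_o KMA observations and return the cosine between estimate and k."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n_v)]
    g_norm = math.sqrt(sum(v * v for v in g))
    k = [v / g_norm for v in g]  # secret key, uniform over the hypersphere
    a = math.sqrt(n_v * sigma_x ** 2 * 10 ** (-dwr_db / 10))
    messages = [rng.randint(0, 1) for _ in range(n_o)]
    contents = [ss_embed([rng.gauss(0.0, sigma_x) for _ in range(n_v)], m, k, a)
                for m in messages]
    k_hat = estimate_key(contents, messages)
    return sum(u * v for u, v in zip(k_hat, k))
```

Comparing, for instance, `cosine_with_true_key(64, 1, 10.0)` with `cosine_with_true_key(64, 100, 10.0)` shows how quickly the estimate aligns with the secret key as observations accumulate.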

A. The impact of embedding parameters N v and DWR 

Fig. 4 points out the decrease of the basic key length w.r.t. $N_v$ for a constant embedding distortion. 
Contrary to a statement of [15, Sec. 4.1], the effective key length is not proportional to $N_v$. We can also 
note the relatively fast convergence to the strictly positive asymptote (15), especially at high embedding 
distortions. Fig. 5 highlights the decrease of this asymptotic key length with the embedding distortion. 
The basic key length is computationally significant, say above 64 bits, only for DWR greater than 12 dB 
for $\epsilon = 0.01$. If the watermarking technique is such that a lower DWR remains imperceptible, operating at that 
lower DWR is not recommended from a security point of view. 



IEEE Trans. on Information Forensics and Security, vol. X, no. Y, January 201Z (draft of February 17, 2012) 




Fig. 4. The basic key lengths as a function of $N_v$ for $\epsilon = 10^{-2}$ and DWR $\in \{8, 10, 12\}$ dB, using the theoretical expression (14) (plain lines), estimation 
of the equivalent region presented in Sect. V-B with $N_t = 10^6$ (o) and rare event analysis presented in Sect. V-C2 (*) with 
$N_t = 5 \cdot 10^4$ and $n = 80$. The horizontal dotted lines are the asymptotes (15). 

B. The impact of security parameters $\epsilon$ and $N_o$ 

The decrease of the basic key length with $\epsilon$ is confirmed in Fig. 5. This is not a surprise: the more 
stringent the access to the watermarking channel, the higher the security is. 

Fig. 6 and Fig. 7 illustrate the dramatic decrease of the effective key length when observations are 
available in the KMA context. For example, at DWR = 10 dB, $N_v = 300$ and $\epsilon = 10^{-2}$, the effective key 
length drops from roughly 50 bits to 8 bits for $N_o = 1$ and nearly 0 bits for 10 observations. In brief, 
SS watermarking is not secure if the attacker gets observations. Note also that the approximation (16) is 
very close to the Monte Carlo estimations. 

1) The interplay between security and robustness: Fig. 8 shows the trade-off between robustness, 
measured by $\eta(0)$, and security, gauged by $\ell$. For a given robustness, the longer the host, the better the 
security and the smaller the distortion of the scheme. Conversely, to decrease $\eta(0)$ while keeping the 
basic key length constant, it is better to increase $N_v$ than to increase the distortion. This is due to the 
fact that the effective key length decreases to a strictly positive value w.r.t. $N_v$ but, on the other hand, 
decreases to zero w.r.t. the embedding distortion. Fig. 8 highlights that $\ell$ and $N_v$ both decrease w.r.t. the 
distortion at a constant robustness, as already suggested by Fig. 5. 

Fig. 5. Basic key length for hosts of infinite length given in (15). 

We now compare SS with ISS regarding both security and robustness. Fig. 10 shows that the host 
rejection parameter $\lambda$ has a negative impact on the key length and a limited positive impact on the 
robustness. At low WNR regimes, two different values of $\lambda$ may give the same robustness but two different effective 
key lengths. One should consequently choose the value of $\lambda$ maximizing the security in this case. 

2) The validity of the practical approaches: The practical methods (Monte-Carlo, rare-event estimator 
or equivalent region estimation) match the closed-form expressions (14) and (16) for both small and large effective 
key lengths in Figures 4, 6 and 7. The rare event estimator (Sect. V-C2) and the estimator based on $\theta_\epsilon$ 
(Sect. V-B) are particularly accurate for large key lengths (see Figures 4 and 10), whereas the Monte-Carlo 
estimator is more efficient for small key lengths (see Figures 6 and 7). 



VII. Conclusion 

In this paper, we have proposed a new measure called the effective key length to characterize 
watermarking security. Contrary to symmetric cryptography, there are several keys granting access to 
the watermarking channel. This gives birth to the notion of equivalent keys. The effective key length 
represents the difficulty of finding such a key. 

Fig. 6. Key lengths as a function of $N_v$ for $\epsilon = 10^{-2}$, $N_o = 1$, and different DWR, using approximation (16) and Monte-Carlo simulations of 
Sect. V-A (o) with $N_1 = 1$ and $N_2 = 10^6$. 

We have computed the effective key length theoretically and practically for additive spread spectrum 
schemes. The main conclusions of this specific analysis are the following. For a constant error rate against 
the AWGN channel, the effective key length increases w.r.t. the length of the host and decreases w.r.t. 
the distortion. Contrary to what was stated in [15], the effective key length is not proportional to the size 
of the host. The effective key length decreases dramatically with the number of observations 
in the KMA context, which strongly supports the idea of changing the embedding key as frequently as 
possible. 

Our future work will apply this methodology to other watermarking schemes (such as DC-QIM) and 
to other attack scenarios such as the Oracle attack. 

Appendix A 
Probabilities for Spread Spectrum 

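Let $\mathbf{X} \sim \mathcal{N}(\mu\mathbf{e}_1, \sigma^2\mathbf{I}_{N_v})$, where $\mathbf{e}_1$ is the first vector of the canonical basis of $\mathbb{R}^{N_v}$. The appendix 
gives the probability that the normalized correlation $D = \mathbf{X}^T\mathbf{e}_1/\|\mathbf{X}\|$ is above a threshold $\tau$. A simpler 
problem is the computation of: 

\begin{align}
P[D^2 > \tau^2] &= P\left[\frac{X_1^2}{\sum_{i=1}^{N_v} X_i^2} > \tau^2\right] \tag{24}\\
&= P\left[\frac{X_1^2}{\sum_{i=2}^{N_v} X_i^2} > \frac{\tau^2}{1 - \tau^2}\right] \tag{25}\\
&= P\left[\frac{X_1^2}{(N_v - 1)^{-1}\sum_{i=2}^{N_v} X_i^2} > \frac{(N_v - 1)\tau^2}{1 - \tau^2}\right]. \tag{26}
\end{align}

Fig. 7. Key lengths for $\epsilon = 10^{-2}$, $N_o = 10$, and different DWR using approximation (16) and Monte-Carlo simulations of 
Sect. V-A (o) with $N_1 = 1$ and $N_2 = 10^6$. 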



Denote $F = X_1^2 / \left((N_v - 1)^{-1}\sum_{i=2}^{N_v} X_i^2\right)$. For $\mu = 0$, $F$ is the ratio of two independent $\chi^2$ random variables, each divided by its number of degrees 
of freedom, $\nu_1 = 1$ and $\nu_2 = N_v - 1$; thus it is distributed as a Snedecor F-distribution $F(1, N_v - 1)$ [20, 
26.6], whose CDF at $x$ is given by a regularized incomplete beta function $I_{x/(x + N_v - 1)}(1/2, (N_v - 1)/2)$, and 

$$P[D^2 > \tau^2] = 1 - I_{\tau^2}(1/2, (N_v - 1)/2). \tag{27}$$

This is the probability that a centered white Gaussian vector lies inside a two-nappe hypercone of angle 
$\arccos(\tau)$. By symmetry around the origin, we have for the single-nappe hypercone $P[D > \tau] = (1 - 
I_{\tau^2}(1/2, (N_v - 1)/2))/2$. This holds indeed for any random vector $\mathbf{X}$ whose distribution is spherically symmetric 
around the origin, and in particular for a uniform distribution over the hypersphere. This proves (14) if 
one sets $\mathbf{k} = \mathbf{e}_1$ and $\tau = \cos(\theta_\epsilon)$. Another point is that as $N_v \to \infty$, the distribution of $F$ converges to 
a $\chi_1^2$ distribution [20, 26.6.11] while the RHS of the inequality in (26) converges to $\kappa^2$ if $\tau = \kappa/\sqrt{N_v}$. 
Therefore, $\lim_{N_v \to \infty} P[D > \tau] = (1 - \mathrm{erf}(|\kappa|/\sqrt{2}))/2$. This proves (15) because $\cos\theta_\epsilon = \kappa/\sqrt{N_v}$ due 
to (13). 

Fig. 8. Trade-off between robustness and security. The plot is computed by varying DWR; the ticks show the values of DWR 
for $\eta(0) = 10^{-7}$. 
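The limit used to prove (15) can be checked by simulation. The sketch below is an illustration with hypothetical function names: it draws centered white Gaussian vectors, estimates $P[D > \tau]$ for $\tau = \kappa/\sqrt{N_v}$ with $\kappa > 0$, and compares the result with the asymptotic value.

```python
import math
import random

def mc_prob_cone(n_v, tau, trials, seed=0):
    """Monte Carlo estimate of P[D > tau], D = X_1 / ||X||, X ~ N(0, I)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_v)]
        d = x[0] / math.sqrt(sum(v * v for v in x))
        hits += d > tau
    return hits / trials

def asymptotic_prob(kappa):
    """Limit (1 - erf(|kappa| / sqrt(2))) / 2 obtained for tau = kappa / sqrt(N_v)."""
    return (1.0 - math.erf(abs(kappa) / math.sqrt(2.0))) / 2.0
```

For a moderate $N_v$ the two values already agree to within the Monte Carlo error, which matches the fast convergence to the asymptote noted in Sect. VI-A.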

For $\mu > 0$, $F$ has a non-central F-distribution with noncentrality parameter $\lambda = \mu^2/\sigma^2$ and degrees of 
freedom $\nu_1 = 1$ and $\nu_2 = N_v - 1$, whose CDF is denoted by $F(x; 1, N_v - 1, \lambda)$. Therefore, 

$$P[D^2 > \tau^2] = 1 - F\left(\frac{(N_v - 1)\tau^2}{1 - \tau^2}; 1, N_v - 1, \lambda\right). \tag{28}$$

However, the argument of symmetry no longer holds for deriving $P[D > \tau]$. We propose to write: 

\begin{align}
P[D > \tau] &= P[(D^2 > \tau^2) \,\&\, (D > 0)] \tag{29}\\
&= P[D^2 > \tau^2 \mid D > 0] \cdot P[D > 0] \tag{30}\\
&\approx P[D^2 > \tau^2] \cdot P[D > 0], \tag{31}
\end{align}

with $P[D > 0] = \Phi(\sqrt{\lambda})$. This approximation is accurate for $\lambda \to 0$ and $\lambda \to +\infty$. 
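The accuracy at both ends can be seen from a short bound. The following is a sketch that is not taken from the source: the event names $A$ and $B$ are introduced here only for illustration, and $\tau \geq 0$ is assumed.

```latex
% Assumed notation (not in the source): A = {D^2 > \tau^2}, B = {D > 0}, \tau \ge 0.
\begin{align*}
P[D > \tau] &= P[A \cap B], \qquad P[B] = \Phi(\sqrt{\lambda}),\\
P[A]\,P[B] - P[A \cap B] &= P[A \cap B^c]\,P[B] - P[A \cap B]\,(1 - P[B]).
\end{align*}
% lambda = 0: D is symmetric, so P[A \cap B] = P[A \cap B^c] and P[B] = 1/2;
% the two terms cancel and the approximation is exact.
% lambda -> infinity: both terms are at most 1 - \Phi(\sqrt{\lambda}), so
\[
\left| P[D^2 > \tau^2]\,\Phi(\sqrt{\lambda}) - P[D > \tau] \right| \le 1 - \Phi(\sqrt{\lambda}) \to 0 .
\]
```

Between these two regimes the bound is not tight, so it says nothing about the quality of the approximation at intermediate values of $\lambda$.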
Fig. 9. Trade-off between robustness and security. The plot is computed by varying $N_v$; the ticks show the values of $N_v$ for 
The link with Spread-Spectrum for $N_o > 0$ is the following. The attacker estimates the secret key as 
$\mathbf{K} = \mathbf{Y}/\|\mathbf{Y}\|$ with $\mathbf{Y}$ the average of the observations: 

$$\mathbf{Y} = \frac{1}{N_o}\sum_{i=1}^{N_o}\mathbf{Y}_i = a\mathbf{k} + \frac{1}{N_o}\sum_{i=1}^{N_o}\mathbf{X}_i = a\mathbf{k} + \bar{\mathbf{X}}. \tag{32}$$



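If we assume that the hosts are independent white Gaussian vectors, then their average satisfies $\bar{\mathbf{X}} \sim \mathcal{N}(\mathbf{0}, \frac{\sigma_x^2}{N_o}\mathbf{I})$. Now, $\mathbf{K}$ is 
an equivalent key (i.e. it belongs to the spherical cap) iff $\mathbf{Y}$ belongs to the inner single-nappe hypercone: 
$D = \mathbf{k}^T\mathbf{Y}/\|\mathbf{Y}\| > \cos(\theta_\epsilon)$, which translates into 

$$D = \frac{a + \frac{\sigma_x}{\sqrt{N_o}}U_1}{\sqrt{\left(a + \frac{\sigma_x}{\sqrt{N_o}}U_1\right)^2 + \frac{\sigma_x^2}{N_o}\sum_{i=2}^{N_v}U_i^2}} > \tau = \cos(\theta_\epsilon), \tag{33}$$

where $\mathbf{U} = (U_1, \dots, U_{N_v})$ is the projection of $\bar{\mathbf{X}}$ on a basis of $\mathbb{R}^{N_v}$, whose first vector is $\mathbf{k}$, divided by 
$\sigma_x/\sqrt{N_o}$ so that $\mathbf{U} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. After the transformation that turns $D$ into the r.v. $F$, it appears that this 
latter has a noncentral F-distribution with a noncentrality parameter 

$$\lambda = a^2 N_o/\sigma_x^2 = N_v N_o 10^{-\mathrm{DWR}/10}. \tag{34}$$

This provides the approximation (16) in the text. 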
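As a worked example of (34), DWR = 10 dB, $N_v = 300$ and $N_o = 1$ give $\lambda = 300 \times 1 \times 0.1 = 30$. The sketch below computes $\lambda$ and $\Phi(\sqrt{\lambda})$, and estimates $P[D > \tau]$ by simulating $D$ in the normalized form $(\sqrt{\lambda} + U_1)/\sqrt{(\sqrt{\lambda} + U_1)^2 + \sum_{i \geq 2} U_i^2}$. The threshold `tau` is a free input here, since its value comes from (13), and the function names are hypothetical.

```python
import math
import random

def noncentrality(n_v, n_o, dwr_db):
    """lambda = N_v * N_o * 10^(-DWR/10), as in (34)."""
    return n_v * n_o * 10 ** (-dwr_db / 10)

def prob_d_positive(lam):
    """P[D > 0] = Phi(sqrt(lambda)), Phi being the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(math.sqrt(lam) / math.sqrt(2.0)))

def mc_prob_equivalent_key(n_v, lam, tau, trials, seed=0):
    """Monte Carlo estimate of P[D > tau] with D built from U ~ N(0, I)."""
    rng = random.Random(seed)
    shift = math.sqrt(lam)
    hits = 0
    for _ in range(trials):
        first = shift + rng.gauss(0.0, 1.0)
        rest = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_v - 1))
        hits += first / math.sqrt(first * first + rest) > tau
    return hits / trials
```

Increasing `n_o` at fixed `n_v`, `dwr_db` and `tau` raises $\lambda$ and hence the estimated probability that the attacker's key is an equivalent key.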
In the same way as above, $F$ converges to a non-central $\chi_1^2$ modelled as $(U_1 + \sqrt{\lambda})^2$ when $N_v \to \infty$. 
This makes $P[D^2 > \tau^2] \to P[(U_1 + \sqrt{\lambda})^2 > \kappa^2]$. Parameter $\lambda$ linearly increases with $N_v$ as shown in (34). 
Inspired by [21, Proof of Lemma 2.1], we write: 

$$P[(U_1 + \sqrt{\lambda})^2 > \kappa^2] = P[U_1^2 + \lambda + 2U_1\sqrt{\lambda} > \kappa^2] \geq P\left[U_1 > -\frac{\sqrt{\lambda}}{2} + \frac{\kappa^2}{2\sqrt{\lambda}}\right] \xrightarrow[\lambda \to \infty]{} 1,$$

and so does $\Phi(\sqrt{\lambda})$. In the end, $\lim_{N_v \to \infty} P[D > \tau] = 1$, which shows that the effective key length vanishes 
to zero as $N_v \to \infty$ provided that $N_o > 0$. 
Fig. 10. Trade-off between robustness and security for ISS. The plot is computed by varying $\lambda$ at DWR = 10 dB, $N_v = 80$, 
and $\epsilon = 10^{-2}$. The key length is estimated using the method of Sect. V-B (o) with $N_t = 10^6$ and the rare events estimator of 
Sect. V-C2 with $N_t = 5 \cdot 10^4$ and $n = 80$ (*). 